Multivariate Statistical Tests for Comparing Classification Algorithms

نویسندگان

  • Olcay Taner Yildiz
  • Özlem Aslan
  • Ethem Alpaydin
چکیده

The misclassification error which is usually used in tests to compare classification algorithms, does not make a distinction between the sources of error, namely, false positives and false negatives. Instead of summing these in a single number, we propose to collect multivariate statistics and use multivariate tests on them. Information retrieval uses the measures of precision and recall, and signal detection uses true positive rate (tpr) and false positive rate (fpr) and a multivariate test can also use such two values instead of combining them in a single value, such as error or average precision. For example, we can have bivariate tests for (precision, recall) or (tpr, fpr). We propose to use the pairwise test based on Hotelling’s multivariate T 2 test to compare two algorithms or multivariate analysis of variance (MANOVA) to compare L > 2 algorithms. In our experiments, we show that the multivariate tests have higher power than the univariate error test, that is, they can detect differences that the error test cannot, and we also discuss how the decisions made by different multivariate tests differ, to be able to point out where to use which. We also show how multivariate or univariate pairwise tests can be used as post-hoc tests after MANOVA to find cliques of algorithms, or order them along separate dimensions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multivariate Comparison of Classification Algorithms

Statistical tests that compare classification algorithms are univariate and use a single performance measure, e.g., misclassification error, F measure, AUC, and so on. In multivariate tests, comparison is done using multiple measures simultaneously. For example, error is the sum of false positives and false negatives and a univariate test on error cannot make a distinction between these two sou...

متن کامل

A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)

Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristics algorithms. In this study, we propose simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...

متن کامل

Assessing and Comparing Classification Algorithms

Ethem Alpaydın Department of Computer Engineering Boğaziçi University TR-80815 Istanbul, Turkey [email protected] Abstract Machine learning algorithms induce classifiers that depend on the training set and hyperparameters and there is a need for statistical testing for (i) assessing the expected error rate of a classifier, and (ii) comparing the expected error rates of two classifiers. We re...

متن کامل

Comparing pixel-based and object-based algorithms for classifying land use of arid basins (Case study: Mokhtaran Basin, Iran)

In this research, two techniques of pixel-based and object-based image analysis were investigated and compared for providing land use map in arid basin of Mokhtaran, Birjand. Using Landsat satellite imagery in 2015, the classification of land use was performed with three object-based algorithms of supervised fuzzy-maximum likelihood, maximum likelihood, and K-nearest neighbor. Nine combinations...

متن کامل

Rough Set Approach to Multivariate Decision Trees Inducing

Aimed at the problem of huge computation, large tree size and over-fitting of the testing data for multivariate decision tree (MDT) algorithms, we proposed a novel roughset-based multivariate decision trees (RSMDT) method. In this paper, the positive region degree of condition attributes with respect to decision attributes in rough set theory is used for selecting attributes in multivariate tes...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011